AITopics | state-action value function

Risk-Averse Total-Reward Reinforcement Learning

Neural Information Processing SystemsJun-19-2026, 08:13:26 GMT

Existing model-based algorithms for risk measures like the entropic risk measure (ERM) and entropic value-at-risk (EVaR) are effective in small problems, but require full access to transition probabilities. We propose a Q-learning algorithm to compute the optimal stationary policy for total-reward ERM and EVaR objectives with strong convergence and performance guarantees. The algorithm and its optimality are made possible by ERM's dynamic consistency and elicitability. Our numerical results on tabular domains demonstrate quick and reliable convergence of the proposed Q-learning algorithm to the optimal risk-averse value function.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Add feedback

0d17d033059bacd127f25ab28784f829-Paper-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 18:50:03 GMT

data mining, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

North America (0.46)
Europe > Netherlands (0.28)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Data Science > Data Mining (0.67)

Add feedback

2456a42386e445ba884511aa17ca4a30-Paper-Conference.pdf

Neural Information Processing SystemsFeb-7-2026, 22:15:03 GMT

machine learning, reinforcement learning, resq, (15 more...)

Neural Information Processing Systems

Country: Asia > China (0.14)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

0d17d033059bacd127f25ab28784f829-Paper-Conference.pdf

Neural Information Processing SystemsFeb-7-2026, 19:24:42 GMT

confidence interval, international conference, kernel, (12 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta (0.14)
Asia > Middle East > Jordan (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
(4 more...)

Genre: Research Report > New Finding (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

ResQ: A Residual Q Function-based Approach for Multi-Agent Reinforcement Learning Value Factorization

Neural Information Processing SystemsDec-23-2025, 21:58:26 GMT

The factorization of state-action value functions for Multi-Agent Reinforcement Learning (MARL) is important. Existing studies are limited by their representation capability, sample efficiency, and approximation error. To address these challenges, we propose, ResQ, a MARL value function factorization method, which can find the optimal joint policy for any state-action value function through residual functions. ResQ masks some state-action value pairs from a joint state-action value function, which is transformed as the sum of a main function and a residual function. ResQ can be used with mean-value and stochastic-value RL. We theoretically show that ResQ can satisfy both the individual global max (IGM) and the distributional IGM principle without representation limitations. Through experiments on matrix games, the predator-prey, and StarCraft benchmarks, we show that ResQ can obtain better results than multiple expected/stochastic value factorization methods.

function-based approach, multi-agent reinforcement learning value factorization, resq, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Add feedback

2456a42386e445ba884511aa17ca4a30-Paper-Conference.pdf

Neural Information Processing SystemsOct-2-2025, 23:48:20 GMT

machine learning, reinforcement learning, resq, (15 more...)

Neural Information Processing Systems

Country: Asia > China > Fujian Province > Xiamen (0.04)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

InterQ: A DQN Framework for Optimal Intermittent Control

Aggarwal, Shubham, Maity, Dipankar, Başar, Tamer

arXiv.org Artificial IntelligenceApr-15-2025

In this letter, we explore the communication-control co-design of discrete-time stochastic linear systems through reinforcement learning. Specifically, we examine a closed-loop system involving two sequential decision-makers: a scheduler and a controller. The scheduler continuously monitors the system's state but transmits it to the controller intermittently to balance the communication cost and control performance. The controller, in turn, determines the control input based on the intermittently received information. Given the partially nested information structure, we show that the optimal control policy follows a certainty-equivalence form. Subsequently, we analyze the qualitative behavior of the scheduling policy. To develop the optimal scheduling policy, we propose InterQ, a deep reinforcement learning algorithm which uses a deep neural network to approximate the Q-function. Through extensive numerical evaluations, we analyze the scheduling landscape and further compare our approach against two baseline strategies: (a) a multi-period periodic scheduling policy, and (b) an event-triggered policy. The results demonstrate that our proposed method outperforms both baselines. The open source implementation can be found at https://github.com/AC-sh/InterQ.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

2504.09035

Country: North America > United States > Illinois (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Energy (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

ResQ: A Residual Q Function-based Approach for Multi-Agent Reinforcement Learning Value Factorization

Neural Information Processing SystemsJan-24-2025, 23:29:39 GMT

The factorization of state-action value functions for Multi-Agent Reinforcement Learning (MARL) is important. Existing studies are limited by their representation capability, sample efficiency, and approximation error. To address these challenges, we propose, ResQ, a MARL value function factorization method, which can find the optimal joint policy for any state-action value function through residual functions. ResQ masks some state-action value pairs from a joint state-action value function, which is transformed as the sum of a main function and a residual function. ResQ can be used with mean-value and stochastic-value RL.

multi-agent reinforcement learning value factorization, resq, state-action value function, (4 more...)

Neural Information Processing Systems

Genre: Play > Prospect > Charge (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Deterministic Trajectory Optimization through Probabilistic Optimal Control

Filabadi, Mohammad Mahmoudi, Lefebvre, Tom, Crevecoeur, Guillaume

arXiv.org Artificial IntelligenceJul-18-2024

This article proposes two new algorithms tailored to discrete-time deterministic finite-horizon nonlinear optimal control problems or so-called trajectory optimization problems. Both algorithms are inspired by a novel theoretical paradigm known as probabilistic optimal control, that reformulates optimal control as an equivalent probabilistic inference problem. This perspective allows to address the problem using the Expectation-Maximization algorithm. We show that the application of this algorithm results in a fixed point iteration of probabilistic policies that converge to the deterministic optimal policy. Two strategies for policy evaluation are discussed, using state-of-the-art uncertainty quantification methods resulting into two distinct algorithms. The algorithms are structurally closest related to the differential dynamic programming algorithm and related methods that use sigma-point methods to avoid direct gradient evaluations. The main advantage of our work is an improved balance between exploration and exploitation over the iterations, leading to improved numerical stability and accelerated convergence. These properties are demonstrated on different nonlinear systems.

algorithm, iteration, trajectory, (17 more...)

arXiv.org Artificial Intelligence

2407.13316

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Energy (0.34)
Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

A Distributed Approach to Autonomous Intersection Management via Multi-Agent Reinforcement Learning

Cederle, Matteo, Fabris, Marco, Susto, Gian Antonio

arXiv.org Artificial IntelligenceMay-14-2024

Autonomous intersection management (AIM) poses significant challenges due to the intricate nature of real-world traffic scenarios and the need for a highly expensive centralised server in charge of simultaneously controlling all the vehicles. This study addresses such issues by proposing a novel distributed approach to AIM utilizing multi-agent reinforcement learning (MARL). We show that by leveraging the 3D surround view technology for advanced assistance systems, autonomous vehicles can accurately navigate intersection scenarios without needing any centralised controller. The contributions of this paper thus include a MARL-based algorithm for the autonomous management of a 4-way intersection and also the introduction of a new strategy called prioritised scenario replay for improved training efficacy. We validate our approach as an innovative alternative to conventional centralised AIM techniques, ensuring the full reproducibility of our results. Specifically, experiments conducted in virtual environments using the SMARTS platform highlight its superiority over benchmarks across various metrics.

agent, algorithm, vehicle, (14 more...)

arXiv.org Artificial Intelligence

2405.08655

Country: